Skip to content

WIP/ERR: raise consistent errors for non-existent files #30264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jbrockmendel
Copy link
Member

xref #29102 the json case is xfailed here.

@simonjayhawkins simonjayhawkins added the Error Reporting Incorrect or improved errors from pandas label Dec 14, 2019
pytest.mark.xfail(
reason=(
"Needs to distinguish between a path to read "
"vs a string to parse"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd any thoughts on this? I think I saw a semi-recent discussion about trying to guess whether a string is supposed to be a path or actual content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this was a discussion with @datapythonista and @gfyoung . I don't have a strong point of view in any direction but would be against jumping through a lot of hoops to disambiguate.

I guess if we cared to disambiguate could probably just look for [, {, " or any leading number (ignoring whitespace) to start as those are the requirements for JSON

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave the JSON disambiguation alone for now. I'm still not really that inclined to change at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through old issues, there are a lot that revolve around disambiguating paths vs content, e.g. #5924 for pickle. (one clever suggestion was to have pd.read_pickles, pd.read_jsons, etc as analogues to the stdlib json.loads)

This doesn't need to be resolved for this PR, but I think we're going to need to have a decision about whether/when/how to guess if a string is a path or not.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have to be really careful about this
as this basically reinventing the wheel and this raises lots of security concerns

@jbrockmendel
Copy link
Member Author

reinventing the wheel

If there's an existing wheel we can use, I'm all for that.

Going through the issues, I'm going to try to round up all of the "should we accept strings too?" etc issues so we can at least discuss them all at once.

Also happy to break parts of this off if anything can be identified as "easy"

@jbrockmendel
Copy link
Member Author

Closing, will re-implement in smaller pieces.

@jbrockmendel jbrockmendel deleted the errs-io2 branch November 20, 2021 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
5 participants